Producing Accurate Interpretable Clusters from High-Dimensional Data
Authors
Abstract
The primary goal of cluster analysis is to produce clusters that accurately reflect the natural groupings in the data. A second objective that is important for high-dimensional data is to identify features that are descriptive of the clusters. In addition to these requirements, we often wish to allow objects to be associated with more than one cluster. In this paper we present a technique, based on the spectral co-clustering model, that is effective in meeting these objectives. Our evaluation on a range of text clustering problems shows that the proposed method yields accuracy superior to that afforded by existing techniques, while producing cluster descriptions that are amenable to human interpretation.
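The abstract names the spectral co-clustering model as its starting point but does not spell out the method itself. The sketch below illustrates only that underlying model, not the authors' technique. It follows the two-way bipartite spectral partitioning formulation of Dhillon (2001). It makes several assumptions: the helper name `spectral_cocluster_2way` and the toy matrix `doc_term` are hypothetical, every row and column has a positive sum, and the final step splits by sign rather than running k-means with k = 2. The sketch normalizes the document-term matrix to An = D1^{-1/2} A D2^{-1/2}. It then finds the second singular vector pair by power iteration with the known leading vector projected out, and clusters documents and terms together from the scaled vectors.

```python
import math


def spectral_cocluster_2way(A, iters=300):
    """Hypothetical helper: split rows (documents) and columns (terms) of a
    nonnegative matrix A into two co-clusters via bipartite spectral partitioning.

    Assumes every row and column of A has a positive sum.
    """
    m, n = len(A), len(A[0])
    r = [sum(row) for row in A]                              # row degrees (diagonal of D1)
    c = [sum(A[i][j] for i in range(m)) for j in range(n)]   # column degrees (diagonal of D2)

    # Normalized matrix An = D1^{-1/2} A D2^{-1/2}; its singular values lie in [0, 1].
    An = [[A[i][j] / math.sqrt(r[i] * c[j]) for j in range(n)] for i in range(m)]

    # The leading right singular vector (singular value 1) is proportional to sqrt(c).
    v1 = [math.sqrt(cj) for cj in c]
    s1 = math.sqrt(sum(x * x for x in v1))
    v1 = [x / s1 for x in v1]

    def project_out_v1(v):
        d = sum(a * b for a, b in zip(v, v1))
        return [a - d * b for a, b in zip(v, v1)]

    def normalize(v):
        s = math.sqrt(sum(x * x for x in v))
        return [x / s for x in v]

    # Power iteration on An^T An, kept orthogonal to v1, converges to the
    # second right singular vector v2 (assumes a gap between sigma2 and sigma3).
    v = normalize(project_out_v1([float(j + 1) for j in range(n)]))  # deterministic start
    for _ in range(iters):
        u = [sum(An[i][j] * v[j] for j in range(n)) for i in range(m)]  # An v
        w = [sum(An[i][j] * u[i] for i in range(m)) for j in range(n)]  # An^T (An v)
        v = normalize(project_out_v1(w))

    # Matching left singular vector u2 = An v2 / sigma2 (same sign structure as v2).
    u = normalize([sum(An[i][j] * v[j] for j in range(n)) for i in range(m)])

    # z = [D1^{-1/2} u2 ; D2^{-1/2} v2]; splitting by sign stands in for 2-means here.
    row_labels = [1 if u[i] / math.sqrt(r[i]) >= 0 else 0 for i in range(m)]
    col_labels = [1 if v[j] / math.sqrt(c[j]) >= 0 else 0 for j in range(n)]
    return row_labels, col_labels


# Hypothetical toy counts: docs 0-2 mostly use terms 0-2, docs 3-5 mostly use terms 3-5.
doc_term = [
    [3, 2, 2, 0, 0, 0],
    [2, 3, 1, 0, 0, 1],
    [3, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 3, 2],
    [0, 1, 0, 3, 2, 3],
    [0, 0, 1, 2, 2, 3],
]
row_labels, col_labels = spectral_cocluster_2way(doc_term)
print(row_labels, col_labels)
```

Because terms are clustered jointly with documents, each document cluster comes paired with a term cluster. That pairing is one way a co-clustering model can yield cluster descriptions. This sketch produces a hard partition, however, so it does not capture the overlapping memberships the abstract also calls for.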
Similar resources
Efficient high dimension data clustering using constraint-partitioning k-means algorithm
With the ever-increasing size of data, clustering of large dimensional databases poses a demanding task that should satisfy both the requirements of the computation efficiency and result quality. In order to achieve both tasks, clustering of feature space rather than the original data space has received importance among the data mining researchers. Accordingly, we performed data clustering of h...
Prediction-Constrained Topic Models for Antidepressant Recommendation
Supervisory signals can help topic models discover low-dimensional data representations that are more interpretable for clinical tasks. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals by ...
Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model
We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if . . . then. . . statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generati...
A Least Squares Approach to Estimating the Average Reservoir Pressure
Least squares method (LSM) is an accurate and rapid method for solving some analytical and numerical problems. This method can be used to estimate the average reservoir pressure in well test analysis. In fact, it may be employed to estimate parameters such as permeability (k) and pore volume (Vp). Regarding this point, buildup, drawdown, late transient test data, modified Muskat method, interfe...
Warped Mixtures for Nonparametric Cluster Shapes
A mixture of Gaussians fit to a single curved or heavy-tailed cluster will report that the data contains many clusters. To produce more appropriate clusterings, we introduce a model which warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows us to summarize the properties of the high-dimensional clusters (or density...
Journal title:
Volume, issue:
Pages: -
Publication date: 2005